Module 01 · Foundations

What is AutoGen?

Microsoft Research's framework for building multi-agent AI systems that can reason, collaborate, and execute code.

AutoGen is an open-source framework from Microsoft Research that lets you build systems where multiple AI agents work together to solve complex tasks. Think of it as a runtime for AI teamwork — agents can talk to each other, use tools, write and execute code, and ask humans for input.

Unlike workflow tools (n8n, Zapier) that execute deterministic steps, AutoGen agents reason autonomously. They decide how to solve a problem, not just follow a pre-written path.

Why does AutoGen exist?

🧩
Single LLM Limits
One LLM has a limited context window and can hallucinate. Multiple agents can verify each other, split tasks, and specialize.
🔁
Iterative Refinement
Agents can critique and improve each other's outputs — a critic agent reviewing a writer agent's code, for instance.
🛠️
Tool Execution
Agents can write Python, run it in a sandbox, see the result, fix bugs — closing the loop between planning and execution.
🧑‍💻
Human in the Loop
You control how much autonomy agents have. Inject human approval at any step, or let them run fully automated.

The Big Picture

Human / Task
UserProxy Agent
AssistantAgent
Code Executor
Result
💡
AutoGen v0.4 (AgentChat) is the current stable API. It introduced a cleaner async-first design with AssistantAgent, UserProxyAgent, and GroupChat. This course uses v0.4 patterns.

AutoGen vs. The World

FrameworkParadigmBest For
AutoGenAutonomous multi-agent conversationComplex reasoning, code generation, research tasks
LangGraphStateful graph-based workflowsFine-grained control over agent state & branching
CrewAIRole-based agent teamsBusiness automation with defined roles
n8nDeterministic workflow automationIntegrating SaaS tools with predictable logic
🧠 Quick Check: What fundamentally distinguishes AutoGen from n8n?
A AutoGen is newer and faster
B AutoGen agents reason autonomously; n8n executes deterministic steps
C n8n can't use AI; AutoGen can
D AutoGen only works with OpenAI models
Module 02 · Foundations

Core Concepts

The building blocks: agents, conversations, termination, and the LLM config pattern.

The Two Primary Agents

🤖
AssistantAgent
Powered by an LLM. Receives messages, reasons, and replies. It can suggest code, call functions, and generate plans. Doesn't execute code by default.
👤
UserProxyAgent
Represents a human or an executor. Can run code that the AssistantAgent produces, then feed results back. May prompt a real human for approval.

LLM Configuration

llm_config.py
import autogen
llm_config = {
    "config_list": [{ "model": "gpt-4o", "api_key": "sk-..." }],
    "temperature": 0.1,
    "cache_seed": 42,
}
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

Conversations & Termination

termination.py
# 1. Max turns
user_proxy.initiate_chat(assistant, max_turns=5)
# 2. Keyword
assistant = autogen.AssistantAgent(system_message="...reply TERMINATE when done")
# 3. Custom function
def my_term(msg): return "task_complete" in (msg["content"] or "").lower()
user_proxy = autogen.UserProxyAgent(is_termination_msg=my_term)

Human Input Modes

ModeBehaviorUse Case
ALWAYSAsks human at every stepInteractive sessions, demos
TERMINATEAsks human only on terminationApproval gate at the end
NEVERFully autonomousProduction pipelines
⚠️
Code Execution Safety: Always use Docker sandbox or restricted paths in production. Never run untrusted agent code on bare metal.
🧠 Which agent actually runs Python code that the other agent writes?
A AssistantAgent
B UserProxyAgent
C Both equally
D Neither
Module 03 · Foundations

Your First Agents

Build a working two-agent system from 15 lines of Python.

Installation

terminal
pip install pyautogen
pip install pyautogen[docker]  # for Docker sandbox
export OPENAI_API_KEY="sk-..."

Hello World: Two-Agent System

hello_autogen.py
import autogen
llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "sk-..."}]}

assistant = autogen.AssistantAgent(
    name="assistant", llm_config=llm_config,
    system_message="You are a Python expert. When done, reply: TERMINATE"
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy", human_input_mode="NEVER",
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
    code_execution_config={"work_dir": "coding", "use_docker": False}
)
user_proxy.initiate_chat(assistant, message="Print the first 10 Fibonacci numbers.")
No code block? If the reply contains no code, user_proxy sends "There is no code from the last message, provide the code." — automatically nudging the assistant.
🧠 What triggers the conversation to stop in the example above?
A A max_turns limit of 10
B Code executing successfully
C The assistant including "TERMINATE" in its reply
D AutoGen auto-detects task completion
Module 04 · Patterns

Conversation Patterns

Two-agent, sequential chaining, nested chats, Swarm — and when to use each.

Pattern 1: Two-Agent (Default)

One user_proxy, one assistant. Best for focused single tasks: code generation, Q&A, analysis.

Pattern 2: Sequential Chaining

sequential.py
r1 = user_proxy.initiate_chat(writer, message="Write a blog post about RAG")
r2 = user_proxy.initiate_chat(critic, message=f"Review:\n{r1.summary}")
r3 = user_proxy.initiate_chat(editor, message=f"Apply feedback:\n{r2.summary}")

Pattern 3: Nested Chats

nested.py
assistant.register_nested_chats(
    trigger=user_proxy,
    chat_queue=[{ "recipient": specialist, "summary_method": "last_msg", "max_turns": 3 }]
)

Pattern 4: Swarm (v0.4)

swarm.py
from autogen import SwarmAgent, initiate_swarm_chat
triage = SwarmAgent(name="triage", handoffs=["billing", "tech", "sales"])
initiate_swarm_chat(initial_agent=triage, agents=[triage, billing, tech, sales],
    messages="Can't access account after payment failed")
🗺️
Single task → Two-agent. Pipeline → Chaining. Sub-tasks → Nested. Dynamic routing → Swarm. Team collaboration → Group Chat.
Module 05 · Patterns

Tool Use & Function Calling

Give agents real-world capabilities: web search, database queries, API calls.

Defining Tools with Decorators

tools.py
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Get current weather for a city")
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C, sunny"

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Search the web for current info")
def web_search(query: str) -> str:
    return f"Results for '{query}': ..."

How Tool Calling Works

Task
LLM decides to call
proxy executes fn
result injected
LLM uses result
🔧
Type hints matter — AutoGen uses them to generate JSON schema. Descriptions are prompts — the LLM reads them to decide when to call. Return strings — tools should return str or JSON-serializable values.
🧠 Why does the assistant need register_for_llm and user_proxy needs register_for_execution?
A Just boilerplate — identical under the hood
B LLM needs to know about the tool; executor actually runs it
C Both agents run it independently
D Only needed for GPT-4
Module 06 · Patterns

Group Chat

Orchestrate 3+ specialized agents collaborating on a shared task.

group_chat.py
planner = autogen.AssistantAgent(name="Planner", llm_config=llm_config,
    system_message="Break tasks into subtasks and assign them.")
coder = autogen.AssistantAgent(name="Coder", llm_config=llm_config,
    system_message="Write high-quality Python. No prose, just code.")
critic = autogen.AssistantAgent(name="Critic", llm_config=llm_config,
    system_message="Review code for bugs, edge cases, style.")

groupchat = autogen.GroupChat(
    agents=[user_proxy, planner, coder, critic],
    messages=[], max_round=12, speaker_selection_method="auto"
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(manager, message="Build a FastAPI sentiment endpoint")

Speaker Selection

MethodHowBest For
autoLLM picks most relevant agentGeneral purpose
round_robinAgents take turns in orderStructured loops
randomRandom each turnDiverse perspectives
custom fnYour function decidesComplex routing
⚠️
Token costs: Each agent sees the full history. Use max_round limits and concise system messages.
Module 07 · Production

Memory & RAG

Give agents long-term memory with vector stores and retrieval-augmented generation.

Built-in RAG: RetrieveUserProxyAgent

rag_agent.py
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
rag = RetrieveUserProxyAgent(name="rag", retrieve_config={
    "task": "qa", "docs_path": ["./docs/"],
    "model": "gpt-4o", "vector_db": "chroma",
    "collection_name": "my_docs", "get_or_create": True,
}, code_execution_config=False, human_input_mode="NEVER")
rag.initiate_chat(assistant, problem="What does our API return on auth failure?")

Memory Architecture Patterns

🔍
Vector DB (Semantic)
Embeddings of past conversations, docs, or facts. Retrieve by meaning. Best for knowledge bases, long-term recall.
🗄️
SQL DB (Structured)
Store structured facts: preferences, task history, entities. Best for user profiles, audit trails.
🕸️
Graph DB (Relational)
Model relationships between entities. Best for knowledge graphs, multi-hop reasoning.
In-Context (Short-term)
AutoGen manages conversation history automatically. Limited by token budget — summarize old turns.
🏗️
Production pattern: Vector DB for semantic recall + SQL for structured facts + in-context summarization for recent turns. Tri-layer architecture handles the full spectrum.
Module 08 · Production

AutoGen vs. Other Frameworks

When to use AutoGen, when to use something else, and how to combine them.

ScenarioBest PickWhy
AI writes & debugs code autonomouslyAutoGenCode execution loop + multi-agent review
Research: gather, analyze, synthesizeAutoGenAutonomous reasoning + tool use
Strict step-by-step workflowLangGraph / n8nDeterministic control flow
Role-based teams (PM, dev, QA)CrewAIFirst-class role/goal/task primitives
SaaS integration automationn8n500+ no-code connectors
Complex RAG + reasoningLlamaIndex + AutoGenBest of both
Strengths
• Autonomous code writing + execution
• Flexible conversation patterns
• Human-in-the-loop at any granularity
• Active research + rapid updates
⚠️
Weaknesses
• Less deterministic than graph tools
• Token costs escalate in group chats
• Multi-agent debugging is hard
• v0.4 API still maturing
🧠 Final: Route customer emails to billing/tech/sales using AI. Best combo?
A Pure AutoGen group chat
B Pure n8n with keyword filtering
C n8n triggers → AutoGen Swarm routes → n8n sends
D CrewAI with three agents
🎓
Course Complete — Now Build Something Real
You've covered all 8 modules. Time to apply everything in a full project — the Pizza Order Bot.
🍕
Hands-On Project
Pizza Order Bot
Multi-agent ordering system built with AutoGen v0.4 — applying everything from the course
Section 01 · Design

Architecture

Why multi-agent for a pizza bot? What each agent does. How this differs from Dialogflow CX or Amazon Lex.

You've already built this bot in Dialogflow CX (state machine + NLU) and Amazon Lex (slots + Lambda). The AutoGen version works completely differently — instead of a pre-wired state machine, you have agents that reason their way through the conversation.

No pages, no routes, no explicit state transitions. The agents decide what to ask, when to validate, and when to submit.

Dialogflow CX vs AutoGen — Same Bot, Different Soul

DimensionDialogflow CXAutoGen
Flow controlState machine (Pages → Routes)Agents reason and decide
Input handlingSlot filling + entity typesLLM extracts intent + entities
ValidationRegex on slot valuesValidator agent checks order object
FlexibilityRigid — changes need flow redesignHigh — change system prompt
DebuggingVisual flow trace in consoleAgent message log, print statements

System Architecture

👤 Customer
natural language
🤖 OrderAgent
AssistantAgent (LLM)
🛡️ ValidatorAgent
AssistantAgent (LLM)
🔧 UserProxyAgent
executes tools
get_menu()
calc_price()
validate_order()
submit_order()

Conversation Flow

Customer:"Large margherita, extra cheese, no olives"
OrderAgent:[calls get_menu()] → confirms items exist
OrderAgent:[calls calc_price(size="large", pizza="margherita", extras=["extra_cheese"])]
OrderAgent:[calls validate_order()] → ValidatorAgent returns "VALID"
OrderAgent→Customer:"Large Margherita, extra cheese, no olives — $20.00. Confirm?"
Customer:"Yes please"
UserProxy:[calls submit_order()] → ORD-48291
OrderAgent:"Order #ORD-48291 placed! Ready at 14:35. TERMINATE"
🍕
Key insight: The OrderAgent never follows a fixed script. It reasons about what's missing, asks naturally, and uses tools when ready. "I want a large, actually XL, pepperoni, and can I add mushrooms?" — handled naturally.
🤖
Hands-On Project
The Agents
Three agents, distinct roles, system prompts that are the soul of the system
Section 02 · Design

The Agents

Three agents with distinct roles. The soul of the system lives in the prompts.

🤖
OrderAgent — AssistantAgent
Customer-facing. Conducts the conversation, collects pizza details, calls tools for prices, passes a structured order to ValidatorAgent, then submits on confirmation.
🛡️
ValidatorAgent — AssistantAgent
Called as a tool. Receives order fields, checks pizza type, size, crust, toppings, and price plausibility. Returns VALID or INVALID: [issues].
🔧
UserProxy — UserProxyAgent
Executes all tools. human_input_mode=NEVER for automation, or uses smart_human_reply for interactive terminal sessions.

OrderAgent System Prompt (key excerpt)

ORDER_AGENT_PROMPT
Your job flow:
1. Greet the customer warmly
2. Call get_menu() once (silently) to know what's available
3. Collect: pizza type, size, crust (default: thin), extras, removals
4. Call calc_price() once you have all details
5. Call validate_order() — read VALID or INVALID response
6. If VALID → present summary + ask for confirmation
7. If INVALID → fix issues, recalculate, re-validate
8. On confirmation → call submit_order()
9. Reply with order ID + ETA, then: TERMINATE
💡
Why a separate ValidatorAgent? The OrderAgent focuses on conversation and can make small reasoning errors assembling the final order. The Validator is a second LLM pass with a narrow, strict task — catching issues before they reach the kitchen. Classic critic-agent pattern.
📁
Hands-On Project
Project Setup
File structure, installation, and environment config
Section 03 · Build

Project Setup

Install dependencies, create the project structure, configure your API key.

File Structure

pizza_bot/
  ├── main.py      # entry point — two modes: automated + interactive
  ├── agents.py    # agent definitions, prompts, tool registration
  ├── tools.py     # get_menu, calc_price, submit_order
  ├── menu.py      # pizza data — prices, toppings, crusts
  ├── config.py    # LLM_CONFIG from .env
  ├── .env         # OPENAI_API_KEY=sk-...
  └── requirements.txt

Installation

terminal
python -m venv .venv
source .venv/bin/activate
pip install pyautogen python-dotenv
config.py
import os
from dotenv import load_dotenv
load_dotenv()
LLM_CONFIG = {
    "config_list": [{"model": "gpt-4o", "api_key": os.getenv("OPENAI_API_KEY")}],
    "temperature": 0.2, "cache_seed": None,
}
🧱
Hands-On Project
Menu & Tools
The data layer and the tool functions agents call
Section 04 · Build

Menu & Tools

The pizza data and the four tool functions agents use to interact with it.

menu.py

menu.py
PIZZAS = {
    "margherita":  {"base_price": {"small":10,"medium":14,"large":18,"xl":22},
                    "default_toppings": ["mozzarella","tomato_sauce","basil"]},
    "pepperoni":   {"base_price": {"small":12,"medium":16,"large":20,"xl":24},
                    "default_toppings": ["mozzarella","tomato_sauce","pepperoni"]},
    "bbq_chicken": { ... },
    "veggie":      { ... },
}
EXTRA_TOPPINGS = {"extra_cheese":2.0, "mushroom":1.5, "bacon":2.5, ...}
CRUSTS = ["thin", "thick", "stuffed", "gluten_free"]
CRUST_UPCHARGE = {"stuffed": 3.0, "gluten_free": 2.0}

Key Tool Functions

tools.py
def get_menu() -> str:
    """Return full menu as JSON string."""
    return json.dumps({"pizzas": ..., "extra_toppings": ..., "crusts": ...})

def calc_price(pizza_type: str, size: str, crust: str="thin", extra_toppings: list=None) -> str:
    """Calculate total price. Returns JSON with total + breakdown."""
    total = PIZZAS[pizza_type]["base_price"][size] + CRUST_UPCHARGE.get(crust, 0)
    for t in (extra_toppings or []): total += EXTRA_TOPPINGS[t]
    return json.dumps({"total": round(total, 2), "breakdown": {...}})

def submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name="Guest") -> str:
    """Submit order to kitchen. Returns order_id + ETA."""
    order_id = f"ORD-{random.randint(10000,99999)}"
    # In production: write to DB, call kitchen API, send SMS...
    return json.dumps({"order_id": order_id, "eta": eta, "message": "Confirmed!"})
🤖
Hands-On Project
Order Agent
Tool registration — the key wiring pattern
Section 05 · Build

Order Agent

Creating the OrderAgent and registering tools with the two-decorator pattern.

agents.py — register_tools()
def register_tools(order_agent, validator_agent, user_proxy):

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Fetch the full pizza menu")
    def _get_menu() -> str: return get_menu()

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Calculate total price")
    def _calc_price(pizza_type: str, size: str, crust: str="thin", extra_toppings: list=None) -> str:
        return calc_price(pizza_type, size, crust, extra_toppings)

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Validate order before confirming with customer")
    def _validate_order(pizza_type: str, size: str, crust: str,
        extra_toppings: list, remove_toppings: list, price: float) -> str:
        # Spins up a one-shot ValidatorAgent chat, returns "VALID" or "INVALID: ..."
        order_str = json.dumps({"pizza_type": pizza_type, "size": size, ...})
        vproxy = autogen.UserProxyAgent(name="vp", human_input_mode="NEVER",
            is_termination_msg=lambda x: True, code_execution_config=False)
        vproxy.initiate_chat(validator_agent,
            message=f"Validate:\n{order_str}", max_turns=1, silent=True)
        history = validator_agent.chat_messages.get(vproxy, [])
        for m in reversed(history):
            if m["role"] == "assistant": return m["content"]

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Submit confirmed order. Call ONLY after customer says yes.")
    def _submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name="Guest") -> str:
        return submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name)
🔍
Why wrap the functions? The @ decorators must be applied at definition time in the function scope so AutoGen can capture correct references and bind them to the right agents.
Hands-On Project
Validator Agent
A strict second-pass LLM call that catches errors before the kitchen
Section 06 · Build

Validator Agent

Called as a tool by the OrderAgent — a one-shot LLM check on the assembled order.

VALIDATOR_PROMPT
You are a strict pizza order validator for PizzaLab.

Check ALL of the following:
1. pizza_type is one of: margherita, pepperoni, bbq_chicken, veggie
2. size is one of: small, medium, large, xl
3. crust is one of: thin, thick, stuffed, gluten_free
4. All extra_toppings are on the extras menu
5. Price is plausible for size + extras

Respond with EXACTLY one of:
- "VALID"
- "INVALID: [issue 1]; [issue 2]"

No extra text. Be terse.
⚠️
Learned lesson — don't use nested chats here. We tried register_nested_chats — it fired on every turn and couldn't reliably find the order JSON. The fix: validate_order() is a regular tool that spins up a one-shot initiate_chat internally. Explicit, deterministic, visible in the tool log.
🔗
Hands-On Project
Wiring It Together
main.py — two modes: automated test + smart interactive
Section 07 · Build

Wiring It Together

main.py assembles all agents and handles both automated testing and real interactive sessions cleanly.

main.py — interactive mode (the tricky part)
# KEY LESSON: human_input_mode="ALWAYS" prompts on EVERY turn
# including tool-call turns. Fix: use NEVER + a smart reply function.

def smart_human_reply(recipient, messages, sender, config):
    last = messages[-1] if messages else {}
    role = last.get("role", "")

    if role == "tool":          # tool result turn — pass through
        return False, None
    if last.get("tool_calls"):  # tool call turn — pass through
        return False, None       # ← CRITICAL: prevents 400 BadRequestError

    # Genuine conversational turn — ask the human
    human_input = input("You: ").strip()
    if human_input.lower() == "exit": return True, "exit"
    return True, human_input

user_proxy = autogen.UserProxyAgent(name="customer", human_input_mode="NEVER", ...)
user_proxy.register_reply(trigger=[autogen.AssistantAgent, None],
    reply_func=smart_human_reply, position=0)

Three Bugs We Fixed Building This

BugRoot CauseFix
Infinite validation loopregister_nested_chats fires on every turnvalidate_order() as a regular tool
TypeError: NoneType not iterablecontent=None on tool-call messages(x.get("content") or "")
400 BadRequestErrorEmpty reply inserted between tool_call and tool_resultGuard on last.get("tool_calls")
🧪
Hands-On Project
Running & Testing
Two modes, expected output, edge case test matrix
Section 08 · Production

Running & Testing

Two ways to run the bot. What to expect. What to test.

Run Commands

terminal
# Automated test (scripted replies, no typing needed)
python main.py

# Interactive mode (you type each customer reply)
INTERACTIVE=1 python main.py

Interactive Session Preview

OrderAgent:Welcome to PizzaLab! What can I get for you today?
You:I'd like a margherita pizza
OrderAgent:Great choice! What size — small, medium, large, or XL?
You:Large please
OrderAgent:Thin, thick, stuffed, or gluten-free crust?
You:Stuffed crust, and add extra cheese
OrderAgent:Large margherita, stuffed crust, extra cheese — $23.00. Shall I place this order?
You:Yes!
UserProxy:[submit_order() → ORD-58291]
OrderAgent:Order #ORD-58291 confirmed! Ready at 14:35. TERMINATE

Edge Cases to Test

InputExpected
"I want a Hawaiian pizza"Agent says not on menu, suggests alternatives
"XL margherita stuffed crust + pineapple + bacon"Prices stuffed upcharge ($3) + 2 extras ($4) correctly
"Actually change it to medium"Recalculates from scratch
"No" to confirmationAgent asks what to change, doesn't submit
⬆️
Hands-On Project
Extensions
Voice, memory, multi-order, Azure deployment, MCP
Section 09 · Production

Extensions

Where to take this next — production-grade enhancements.

🎤
Real-Time Voice
Azure ACS Call Automation + Azure OpenAI Realtime API. OrderAgent becomes a phone voice agent. Same logic, different I/O layer.
🧠
Order History Memory
Add get_past_orders(customer_id) tool backed by ChromaDB. Agent can offer "Same as last time?"
👥
Group Order
GroupChat with a CustomerAgent per person + one OrderCoordinatorAgent. Handles "ordering for 4 people" naturally.
☁️
Azure Deployment
Wrap main() in an Azure Function (HTTP trigger). Deploy to Container Apps. Swap OpenAI for Azure OpenAI endpoint.
📊
Observability
Add LangFuse or OpenTelemetry. Trace every agent turn, tool call, and token count. Essential for catching LLM drift in production.
🔌
MCP Tools
Replace manual tool registration with an MCP server. One server, plug into any AutoGen, LangGraph, or Claude agent.
🎓🍕
Course + Project Complete
You've covered all 8 AutoGen modules and built a working multi-agent pizza ordering system — with tool use, validation, interactive mode, and three real bugs debugged. The same pattern scales to any transactional chatbot domain.